Analytic optimization of Plasmodium falciparum marker gene haplotype recovery from amplicon deep sequencing of complex mixtures

Molecular epidemiologic studies of malaria parasites and other pathogens commonly employ amplicon deep sequencing (AmpSeq) of marker genes derived from dried blood spots (DBS) to answer public health questions related to topics such as transmission and drug resistance. As these methods are increasingly employed to inform direct public health action, it is important to rigorously evaluate the risk of false positive and false negative haplotypes derived from clinically-relevant sample types. We performed a control experiment evaluating haplotype recovery from AmpSeq of 5 marker genes (ama1, csp, msp7, sera2, and trap) from DBS containing mixtures of DNA from 1 to 10 known P. falciparum reference strains across 3 parasite densities in triplicate (n = 270 samples). While false positive haplotypes were present across all parasite densities and mixtures, we optimized censoring criteria to remove 83% (148/179) of false positives while removing only 8% (67/859) of true positives. Post-censoring, the median pairwise Jaccard distance between replicates was 0.83. We failed to recover 35% (477/1365) of haplotypes expected to be present in the sample. Haplotypes were more likely to be missed in low-density samples with <1.5 genomes/μL (OR: 3.88, CI: 1.82–8.27, vs. high-density samples with ≥75 genomes/μL) and in samples with lower read depth (OR per 10,000 reads: 0.61, CI: 0.54–0.69). Furthermore, minority haplotypes within a sample were more likely to be missed than dominant haplotypes (OR per 0.01 increase in proportion: 0.96, CI: 0.96–0.97). Finally, in clinical samples the percent concordance across markers for multiplicity of infection ranged from 40%-80%. Taken together, our observations indicate that, with sufficient read depth, the majority of haplotypes can be successfully recovered from DBS while limiting the false positive rate.


Introduction
Malaria parasite surveillance and molecular epidemiologic studies increasingly employ as a genotyping approach amplicon deep sequencing (AmpSeq) of short polymorphic fragments of parasite DNA to identify haplotypes present in a sample.Depending on which segments of the genome are sequenced, this approach returns haplotypes that can be used to estimate complexity of infection [1], investigate transmission between hosts [2,3], evaluate the prevalence and incidence of markers of drug resistance [4][5][6], and classify recurrent infections following drug treatment as reinfections or recrudescences [7,8].Similarly, these methods are used in molecular epidemiologic studies of viral and bacterial pathogens [9].As a result of these diverse use cases of AmpSeq, there is a broad need for practical and empirically-derived approaches to maximize haplotype recovery and mitigate the risks of false genotypes.
Prior groups have evaluated the accuracy of haplotype recovery from mixtures containing DNA from known P. falciparum strains across a range of available tools and parameters, and reported that strains present in low proportions are likely to be missed [2,10] and that false positive haplotypes often have lower read depth [11,12].In a large analysis of complex mixtures of up to five reference strains, recovery of two markers was compared using four haplotype calling tools [12].They found that fewer haplotypes are recovered from samples with less P. falciparum DNA, that haplotypes with a lower read count were more frequently false positives, and that the four different haplotype calling tools performed similarly.What remains unexplored by these prior reports are investigations of haplotype recovery from samples with three key features of field studies: i) prepared and processed as dried blood spots (DBS), ii) present across a range of densities reflective of infections that are typically observed in field studies, and iii) harboring genomes from a large range of P. falciparum strains, which in natural infections can exceed 15 [2].
We evaluated the accuracy of the recovery of diverse P. falciparum haplotypes from DBS harboring simple and complex mixtures of parasite genomes.To do so, we prepared mixtures of up to 10 parasite strains at known proportions and across three parasite density categories, and amplified and sequenced each in triplicate with MiSeq across polymorphic segments of five distinct markers (ama1, csp, msp7, sera2, and trap).With these reads, we employed an existing tool for haplotype inference to investigate the influence of parasite density, genomic complexity, and haplotype censoring criteria on the removal of false positive haplotypes, the sensitivity and precision of haplotype discovery, inter-replicate variability, and the ability to recover expected haplotypes at each locus.
For each reference strain and marker (n = 50), Unipro UGENE v42 [14] was used to map forward and reverse reads from Sanger sequencing to the respective marker gene.The trimming quality threshold and mapping minimum similarity were set to zero.The sequences

PLOS GLOBAL PUBLIC HEALTH
were manually trimmed and, where discrepancies in base calls were observed between forward and reverse reads, bases were called manually.Where possible, the Sanger sequences were validated against publicly-available sequences.These sequences were defined as the true reference sequence for each strain, and this the reference strain haplotypes (n = 5 per strain, 1 for each marker).
Five mock polygenomic infections and a 3D7-only mock infection were created by making control mixtures that combined 1 ng/μl gDNA stocks of the distinct parasite reference strains in known percentages ranging from 1% to 100% (Fig 1A).Each control mixture was serially diluted in uninfected whole blood, and dried blood spots (DBSs) were made for each of the 11 dilutions per mixture.DBS were singly punched into individual wells of a deep 96-well plate, and a modified Chelex-100 protocol [3] was used to make gDNA extracts.These were then tested in duplicate with a duplex pfr364/human β-tubulin quantitative PCR (qPCR) assay that estimated parasite densities using a standard curve generated with extracts from control DBS at dilutions of P. falciparum 3D7 ranging from 0.1 to 2000 parasites/μL of whole blood [15].Control mixture extracts were assigned to one of three parasite density ranges (low, <1.5 genomes/μl; medium, 1.5-75 genomes/μl; and high, �75 genomes/μl) and pooled by mixture at each density range for a total 18 pools (6 mixtures x 3 densities) to be used as templates for subsequent PCR amplification.

Library preparation and sequencing
Each mixture template was prepared for sequencing according to qPCR Ct-value as described in [3].Then, from each mixture template, we amplified at the target segments of ama1, csp, msp7, sera2, and trap in individual reactions in triplicate using a nested PCR strategy.Library preparation for sequencing followed described methods [16] with the following exceptions: PCR1 reactions included 300nM of each primer and 7 μl of template gDNA when extract Ct was < 28 (high density), 18 μl when 28 � Ct < 34 (medium density), and 15 μl concentrated extract when Ct � 34 (low density).PCR 1 cycling conditions were 95C x 3 0 !(98C reactions included 2 μl of template when gDNA pool extract Ct was < 28, and 8 μl of template when Ct was � 28.The resulting dual-indexed libraries were then pooled and purified as previously described [16] before sequencing on an Illumina MiSeq (v3 300PE) platform.Raw sequences have been deposited under BioProject PRJNA1008913.

Haplotype recovery
We used Snakemake v 7.20.0[17] to build an integrated pipeline for haplotype recovery, BRAVA (Basic and Rigorous Amplicon Variant Analyzer; https://github.com/duke-malariacollaboratory/BRAVA) in order to trim, filter, and map reads, and thence call haplotypes.Primers and adapters of amplicon deep sequencing reads for each marker were removed using Cutadapt v4.1 [18].These reads were trimmed using Trimmomatic v0.38 [19]; this removed the leading and trailing bases below a Phred quality score of 10, removed all nucleotides from the 3' end after the quality of the read falls below an average Phred quality score of 15 over a sliding window of 4 nucleotides, and dropped reads with fewer than 80 nucleotides.Remaining reads were mapped to the 3D7 reference genome using BBmap v39.01 [20].In a sensitivity analysis, the mapping was repeated using the HB3 reference genome.Reads were then further filtered and trimmed using the R package DADA2 v1.20.0 [21] function filterAndTrim with a maximum number of expected errors (maxEE) equal to 1. Values ranging from 2 to 10 were tested for the truncQ parameter in filterAndTrim, which truncates reads at the first instance of a quality score �truncQ.The optimal value was determined to be the value that maximized the number of reads used for haplotype calling [10]; the haplotypes that were output when using this value of truncQ were used for all subsequent analyses.Next, the learnErrors function was used to learn error rates, the dada function was used to remove sequencing errors and identify haplotypes, and the removeBimeraDenovo function was used to remove chimeras.All haplotypes returned by DADA2 were included for analysis.

Categorization of haplotypes
We define a haplotype as a unique sequence returned by DADA2 (as described above).For each locus in each sample, we further categorized each haplotype returned by DADA2 into one of three groups: 1. Expected haplotype: A haplotype with an identical sequence to that of a template sequence (reference haplotype) expected to be observed in the sequenced library.These were considered true positive haplotypes.

Haplotype arising from systematic error:
A haplotype with a sequence or read depth that we did not expect to observe in the sequenced library, but which was observed across all three replicates for at least one density bin.These were suspected to be truly present owing to either inadvertent introduction to mixtures during gDNA preparation or the presence of multiple haplotypes in the original source gDNA.In most cases, the sequence of these haplotypes was that of a reference strain that was not expected to be present in the mixture, supporting the former over the latter hypothesis.Haplotypes arising from systematic error were removed from the analysis prior to screening for optimal thresholds for haplotype censoring, as we suspected that these template strains were truly present in the library that was sequenced and therefore shouldn't be expected to be corrected by applying filtering criteria.
3. Haplotype arising from random error: A haplotype that we did not expect to observe in the sequenced library, and that was not consistently present across replicates for any mixturedensity combination.These were considered false positive haplotypes.

Identification of optimal thresholds for haplotype censoring
We evaluated the efficacy of four common metrics used to censor haplotypes: i) the depth of reads within a sample supporting a haplotype (read depth), ii) the proportion of reads within a sample supporting a haplotype (read proportion), iii) the ratio of abundances of pairs of haplotypes within a sample with a Hamming distance of one (read ratio), and iv) the length difference of the returned haplotype relative to that of the expected reference strain (length difference).As mentioned above, haplotypes arising from systematic error were removed prior to evaluating these criteria.All reference strain haplotypes for all loci were identical in length to the 3D7 haplotype, except one msp7 haplotype that was 3 base pairs shorter.Thus, we defined this censoring criterion as follows: the difference in length between the observed haplotype and the 3D7 reference haplotype must be equal to 0, -3, or 3 (i.e. one codon may be inserted or deleted).For the other 3 censoring criteria, we used Youden's J statistic to identify optimal thresholds across all possible thresholds of the criterion and corresponding confidence intervals with the coords and ci.coords functions from the R package pROC v1.18.0 [22].Because the importance of retaining true positive haplotypes vs. removing false positive haplotypes varies depending on the use case, this statistic was computed using three different ways to weight false negative vs. false positive classifications: equal weight to false negatives and false positives, 2x the weight to false negatives, and 2x the weight to false positives.To evaluate censoring criteria, we used the optimal criteria based on false negatives having 2x the cost of false positives.

Risk factor analysis for missing haplotypes
In order to identify what factors were associated with the failure to recover a haplotype from a mixture, we performed a bivariate and multivariate logistic regression of risk factors for haplotype missingness in R using the glmer function in lme4 v1.1.32[23].Missing haplotypes were defined as those that were not observed in the sample prior to the application of any haplotype censoring criteria.The outcome was the presence or absence of the haplotype in the un-censored haplotypes, and risk factors were target, starting proportion of the reference template strain, read depth (per 10,000 reads), parasite density, and expected number of distinct haplotypes present in the sample.A random intercept was included for each mixture-density combination.Low-density mix C samples were excluded from this analysis as they exhibited signatures of contamination from a high-density sample.

Clinical sample analysis
Ten P. falciparum-positive DBS collected in a field study in Webuye, Kenya that were previously sequenced at the ama1 and csp loci [2] were sequenced at the msp7, sera2, and trap loci.These samples were selected from those that were high-density and had MOIs >1 at both ama1 and csp loci (using previously defined haplotype calls and censoring criteria [2]).Haplotypes for newly sequenced loci were called with the pipeline described above, using the same method as for ama1 and csp.All haplotypes were censored using the identified optimal censoring criteria.

Ethical statement
The field study in which the clinical samples were collected was approved by institutional review boards of Moi University (2017/36) and Duke University (Pro00082000).All participants or guardians provided written informed consent, and those over age 8 years provided additional assent.Enrollment began 1 July 2017, and as an open cohort, is ongoing.
Code and data to recreate the analyses and figures in this manuscript can be found at https:// github.com/duke-malaria-collaboratory/haplotype_recovery_experiment.

Mixtures, reference strains, deep sequencing, and haplotype calling
We sequenced five previously-developed AmpSeq marker genes: ama1, csp, msp7, sera2, and trap (Table 2), and generated for sequencing 6 mock infections harboring mixtures of gDNA from between 1 and 10 distinct parasite reference strains (Fig 1A ) to approximate the polygenomic nature of many infections in high-transmission areas.Not all marker genes were unique to a strain; a total of 37 distinct haplotypes were present across the 10 strains and 5 markers.For each marker gene, each of the 6 mixtures was sequenced from dilution pools corresponding to low (<1.5 genomes/μL), medium (1.5-75 genomes/μL) and high (�75 genomes/ μL) parasite density bins in triplicate, tallying to 1365 expected haplotype occurrences across 270 sequenced samples.We obtained analyzable reads for 257/270 samples, with differences in the absolute yield of read counts between low (4.3 million), medium (8.0 million), and high (10.2 million) density samples.This general observation held for each individual marker, save for trap and msp7 which returned moderate read amounts irrespective of parasite density bin (Fig 1C).Overall, we observed across the five loci and 257 samples 1292 haplotype occurrences (Fig 1D ), for which the median read depth was 1542.The haplotypes returned were identical when reads were mapped to either the 3D7 or to the HB3 reference genome.

False positive haplotypes
We first investigated false positive haplotype occurrences across samples.Within each sample, we categorized each observed haplotype as expected to be present in the sample (true positive, n = 859/1292, 66%), likely cryptically present in the original mixture (systematic error; n = 254/1292, 20%), or likely arising from random error (false positive, n = 179/1292, 14%) (Fig 2A).Only 1% of reads that passed filtering supported haplotypes that were categorized as false positives.We observed this trend of proportionately few reads supporting proportionately more false positive haplotypes across both markers and parasite density bins (Fig 2B).Furthermore, the percentage of false positive haplotypes was relatively similar across parasite density bin (12-16%), although for ama1 and sera2, there were fewer false positive haplotypes for lowdensity templates (

Inter-replicate variability
To evaluate the consistency with which haplotypes were returned, we measured inter-replicate variability post-censoring.Overall, 58% of haplotypes were observed in all 3 replicates, 18% in 2 replicates, and 24% in 1 replicate.Haplotypes were more consistently returned in all three replicates for high-density samples (76% of the time) compared to medium-(61% of the time) and low-density samples (30% of the time) (Fig 4A).Consistent with this, in high-density samples Jaccard distances between replicates were higher (median = 1, IQR = 0.2) compared to medium-(median = 0.83, IQR = 0.5) and low-density samples (median = 0.5, IQR = 0.75) (Fig 4B).

Missing haplotypes
Of the 1365 haplotype occurrences expected to be present across all samples, we did not recover 477 (35%).Thus, we next investigated factors associated with missing haplotypes.As expected, haplotype proportion within a sample was inversely associated with missingness, with each increase of 0.01 in proportion associated with a 4% reduction in the likelihood of being missed (OR: 0.96, 95% CI: 0.96-0.97),even when controlling for marker, density bin, number of reads in the sample, and expected number of haplotypes (Fig 5A ; Table 3).Additionally, for all markers except trap, <15% of haplotypes were missed from high-density

Estimating multiplicity of infection based on marker haplotype diversity
We next compared the expected multiplicity of infection (MOI) to the observed MOI after censoring, with MOI expressed as the number of haplotypes observed at each individual We performed a similar comparison using 10 high-density P. falciparum infections collected as DBS through a recent field study in Western Kenya, in order to capture a broader naturallyoccurring diversity of marker haplotypes [2].Using the optimal censoring criteria defined  PLOS GLOBAL PUBLIC HEALTH above, we observed 142 haplotypes across all samples and markers, of which 36 (25%) were censored.The range of MOIs was 1-5 for each marker.No marker consistently estimated the highest or lowest MOI, and the percent concordance ranged from 40% to 80% (Fig 6B ).

Discussion
AmpSeq is an increasingly popular tool for molecular epidemiologic studies of various pathogens including P. falciparum collected on DBS, which necessitates rigorous haplotype recovery from field samples.We prepared DBS containing mixtures of gDNA from reference P. falciparum strains, amplified and sequenced polymorphic segments of 5 common marker genes in triplicate, and quantified the performance of haplotype recovery using a range of metrics.We observed that high sample read depth was associated with enhanced recovery of most haplotypes present in the original sample, and that censoring criteria based on read depth, read proportion, read ratio, and haplotype length can effectively remove most false positive haplotypes while retaining most true positive haplotypes.Thus, for use-cases which involve high-density samples or samples sequenced at high read depth, rigorous recovery can be achieved for multiple markers.Consistent with prior studies [2,10,12], we observed that the likelihood of haplotype recovery is enhanced by higher parasite density and by a larger proportion of an individual haplotype within a mixture.In particular, the consistency with which we observed haplotypes across replicates was higher in high-density samples compared to low-density samples.However, we further observed that, independent of parasite density and reference haplotype proportion, successful haplotype recovery was further associated with a higher overall sample read depth.The ability to recover haplotypes constituting a minority population within a parasitemia with an overall low density is an important goal for many use cases of AmpSeq.Namely, therapeutic efficacy studies of antimalarials use active case detection to screen for recurrence of parasites, and frequently capture low-density infections with multiple strains which must then be compared to those in the initial infection in order to distinguish reinfection from recrudescent infection [7].Additionally, studies of transmission networks in highly endemic settings in which low-density, asymptomatic infections predominate also benefit from comprehensive profiling of strains within mixtures in order to ascertain parasite relatedness between hosts [2].In these and similar use cases, the likelihood of detecting minority haplotypes can be improved by maximizing per-sample read depth, such as by limiting multiplexing and selecting maximal sequencing platform output.
We observed very different optimal censoring thresholds depending on how we weighted the relative importance of false positive and false negative haplotypes, which highlights the need to select censoring criteria suitable for the primary study objective.Penalizing false negative haplotypes more than false positive haplotypes yielded haplotype censoring criteria that still managed to remove most false positive haplotypes while retaining high sensitivity.Furthermore, these criteria were consistent with thresholds that others have used and reported in the literature (read depth: 204-420, read proportion: 0.005-0.014,read ratio: 0.09-0.36)[2,12].
We observed inconsistency in performance between markers with respect to false positives, censoring, missingness, and MOI.Pre-censoring, false-positive haplotypes were rarely recovered for msp7 but common for ama1 and trap.However, post-censoring the number of false positives was relatively low for all markers but trap.Fewer haplotypes were recovered for sera2 and trap overall.Furthermore, there was no consistent trend across a limited set of clinical samples of marker-specific MOI, suggesting that MOI estimates based upon a single marker may frequently underestimate the true MOI of a sample, as previously described [12].Since most markers returned largely correct haplotype calls across a range of mixtures PLOS GLOBAL PUBLIC HEALTH and parasite density bins, choice of marker may depend not only on marker performance but also other factors such as the biological question of interest (e.g.transmission, vaccine development, etc.).
Despite controlled laboratory conditions, we observed signatures of both systematic and random error.Systematic error may have resulted from two different sources.First, it is possible that multiple haplotypes were present in the original template strains and were missed during Sanger sequencing, a limitation of this sequencing method.Second, systematic error could arise from contamination during gDNA extraction.Owing to the high-throughput manner of DBS processing, using 96-well plates, it is unfortunate but expected that we observe contamination in a small minority of samples included in a sequencing run.This highlights the importance of meticulous laboratory work and thoughtful controls, particularly because these haplotypes are less likely to be removed by censoring criteria owing to their presence in the original template.In contrast, random error may arise due to PCR stochasticity and polymerase error in low-input next-generation sequencing libraries [41].This is also inevitable, and the censoring criteria described here successfully removed many haplotypes arising from these technical errors.
Our study had several limitations.First, we created the mixtures from gDNA rather than from intracellular DNA; therefore, the composition of the solution from which DNA was amplified was slightly less complex than that from clinical samples.However, as we extracted DNA from DBS, our results provide a closer approximation to clinical samples than previous studies.Second, we did not attempt to censor haplotypes arising from systematic error because the commonly used censoring criteria assessed here assume that false positive haplotypes arise from random rather than systematic error.Third, this study focused on in silico recovery of haplotypes, and replicates were drawn from the same gDNA extract pools.Thus, variability occurring due to extraction is not accounted for in these data.However, our results provide useful insight into variation and random errors occurring at the amplification and sequencing steps.

Conclusions
We observed that P. falciparum haplotypes from multiple different targets can be successfully recovered from DBS, that in the majority of cases these haplotypes are recovered across replicates, and that censoring criteria already used by the community remove most false positive haplotypes while retaining high sensitivity.These observations can be used to guide analysis and interpretation both of P falciparum haplotypes recovered from DBS but also of other pathogens who share with malaria parasites high genetic variability, multiplicity of strains, and informative genetic markers.

Fig 1 .
Fig 1. Overview of mixtures, reference strains, and sequence yield.(A) Overview of mixtures A through F, each composed of various proportions of gDNA from the listed P. falciparum reference strains (colors).(B) Pairwise single nucleotide variant (SNV) distances between reference haplotypes of each of the marker genes obtained by Sanger sequencing.(C) Number of reads in each sample by parasite density bin, faceted by marker gene.(D) Total number of pre-censored haplotype occurrences for each marker across all mixtures and replicates, colored by parasite density bin.Note that ama1 was sequenced separately from the other markers so read depth cannot be directly compared between ama1 and other markers.g/μL = genomes/μL.https://doi.org/10.1371/journal.pgph.0002361.g001 Fig 2C).False positive haplotypes were often not the correct sequence length, were often only one nucleotide different from a reference sequence in the sample (Fig 2C), and had lower read depths than haplotypes we expected to observe (median = 104 vs. 2393, Wilcox p < 0.001; Fig 2D).

Fig 2 .
Fig 2. Overview of false positive haplotypes.(A) Sample-level haplotype overview.Stacked boxes in each column represent observed haplotypes from reads that passed filtering, categorized as those expected in the reference (gray), arising from systematic error (pink), or from random error (red).Box heights indicate the number of reads supporting the haplotype.(B) Proportion of reads and of haplotypes by marker categorized as expected reference, systematic error, and random error.(C) False positive haplotypes by marker categorized by unexpected length and by SNV distance to the 3D7 reference sequence.Hamming distances were only computed for haplotypes identical in length to the 3D7 reference sequence.(D) Read depth for expected (gray), systematic error (pink) and random error (red) haplotypes by parasite density bin and by marker.g/μL = genomes/μL.https://doi.org/10.1371/journal.pgph.0002361.g002

Fig 3 .Fig 4 .
Fig 3. Optimization and application of censoring criteria.(A-C) Sensitivity and specificity across ranges of tested thresholds for haplotype (A) read depth, (B) read proportion, and (C) ratio within a sample between haplotypes with a Hamming distance of 1. (D) Count of censored haplotypes by the criterion by which they were censored and by density of parasites in DBS sample.The majority of censored haplotypes were non-reference haplotypes and fell under the identified read depth threshold.(E) Numbers of censored and uncensored haplotypes by haplotype category and by marker.(F) Proportion of uncensored (light grey) and censored (dark grey) haplotypes likely arising from random error, by parasite density bin and marker.(G) Reference haplotype percent of censored haplotypes.No reference haplotypes were censored out for trap.g/μL = genomes/μL.FN = false negative; FP = false positive.g/μL = genomes/μL.https://doi.org/10.1371/journal.pgph.0002361.g003

Fig 5 .
Fig 5. Summary of missing haplotypes.(A) Numbers of missing haplotypes (light grey), observed but censored haplotypes (medium grey), and observed haplotypes (dark grey) in individual samples by marker and parasite density bin.The number in each facet indicates the percentage of missing haplotypes.All subsequent panels in this figure consider observed but censored haplotypes as missing.(B) Correlation between the overall read depth of a sample and proportion of all expected haplotypes within a mixture that were not successfully recovered.Color indicates marker, and shape indicates parasite density.Spearman's rho = -0.62.(C) Correlation between proportions of expected and observed haplotypes within individual samples by parasite density bin, colored by marker.(D) Number of replicates in which the haplotype was found by binned strain percent in the original mixture (present at <10% or �10%).Each point is a haplotype colored by marker.The grey color beneath the points indicates the percent of haplotypes across all targets and mixtures in a given strain percent bin that were observed in the corresponding number of replicates.Low-density mix C samples were excluded from this figure as they exhibited signatures of contamination from a high-density sample.g/μL = genomes/μL.https://doi.org/10.1371/journal.pgph.0002361.g005

Fig 6 .
Fig 6.Estimated multiplicities of infection (MOIs) based on each marker haplotype.(A) Observed minus expected MOIs for mixtures post-censoring.(B) Estimated MOIs in clinical samples.Haplotypes were censored according to the optimal criteria identified above, giving false negatives 2x the cost of false positives.Not all trap clinical samples returned sequences.Percent concordance was computed for each sample as the percentage of markers for which the estimated MOI was equal to the mode MOI.https://doi.org/10.1371/journal.pgph.0002361.g006